METHOD AND SYSTEM FOR TRANSMITTING AND RECEIVING
VOICE PACKETS OVER A COMMUNICATIONS NETWORK 



Background of the Invention 

Before voice can be transmitted over a network, the voice 
must be sampled and encoded to produce data that represents 
speech. Adaptive multi-rate (AMR) speech codecs represent a new 
generation of coding algorithms that are designed to work with 
inaccurate transport channels, such as wireless transmission 
channels. The AMR speech codec has built-in mechanisms that 
make it tolerant to a certain level of bit errors introduced by the 
transport channel. It is designed to restore the original speech, 
with some degradation, even though the coded speech is received 
with some bit errors. 

In most Internet Protocol (IP) networks, precise data 
transportation is the norm, and whenever bit errors are detected in 
the data being transported, the data is discarded. Usually, the 
transport protocol (e.g., User Datagram Protocol (UDP) or 
Transmission Control Protocol (TCP)) performs the bit error 
checking and drops packets that are found with errors. 

When transmitting voice data over an IP network the quality 
of speech at the receiving end may be degraded when network 
congestion causes voice data packets to be lost or discarded in the 
network. When the IP network encounters congestion, some routers 
between a voice packet sender and a voice packet receiver may 
receive more data packets than they can forward in a timely manner
to their neighboring routers. This will cause the congested router to
randomly drop some data packets, which may include the voice 
packets from the voice packet sender. 

Therefore, it should be apparent that there remains a need
for an improved method and system for transmitting and receiving
voice packets over a communications network. Once congestion on
the network has been detected, the improved method and system
should attempt to alleviate the congestion to reduce the number of
dropped packets while maintaining voice quality.

Brief Description of the Drawings

The novel features believed characteristic of the invention are
set forth in the appended claims. The invention itself, however, as
well as a preferred mode of use, further objects, and advantages
thereof, will best be understood by reference to the following
detailed description of an illustrative embodiment when read in
conjunction with the accompanying drawings, wherein:

FIG. 1 illustrates a communications network for transmitting
voice packets from a packet sender to a packet receiver in
accordance with the method and system of the present invention;

FIG. 2 is a high-level block diagram of a voice packet 
transceiver in accordance with the method and system of the 
present invention; 

FIG. 3 is a high-level logic flow chart that illustrates the
method and operation of receiving a voice packet in accordance with
the method and system of the present invention;

FIG. 4 is a high-level logic flow chart that illustrates the
method and operation of transmitting a voice packet in accordance
with the method and system of the present invention; and

FIG. 5 is a more detailed representation of a voice packet in
accordance with the method and system of the present invention.

Detailed Description of the Invention

With reference now to FIG. 1, there is depicted a 
communications network for transmitting and receiving voice data 
contained in voice packets. As illustrated, communications network 
20 includes packet sender 22 and packet receiver 24 that
communicate with one another through IP link 26. IP link 26 is 
preferably implemented with a network running internet protocol 
(IP) to route a data packet from its source, such as packet sender 22, 
to its destination, such as packet receiver 24. 

In order to convert speech into data, packet sender 22
includes multi-rate speech encoder 28. Multi-rate speech encoder
28 is preferably implemented with an adaptive multi-rate (AMR)
speech coder that is capable of encoding speech bits in a plurality of
modes, wherein each mode encodes a different number of speech
bits for the same speech input signal. AMR speech coders are more
completely described in an article entitled "AMR Speech Codec;
General Description (3G TS 26.071 Version 3.0.1)," published by 3rd
Generation Partnership Project (3GPP), June 2000.

Encoding rate control 30 in packet sender 22 controls the rate
at which multi-rate speech encoder 28 encodes speech. Encoding rate
control 30 determines an encoding rate based in part upon "change
encoding rate messages" sent from packet receiver 24 to packet
sender 22 through IP link 26. In a preferred embodiment, these
messages request either an increase in speech encoding rate or a
decrease in speech encoding rate. Alternatively, these messages may
request a specific encoding rate. The change encoding rate message
may also be referred to as a "mode request" message.
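
As a minimal sketch of how an encoding rate control might act on such
messages, the following Python fragment steps through the eight AMR
narrowband modes one at a time. The function name, the request strings
"increase" and "decrease", and the one-step policy are illustrative
assumptions and are not taken from this description.

```python
# AMR narrowband codec modes in kbit/s, lowest to highest.
AMR_MODES_KBPS = (4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2)


def apply_rate_request(mode_index, request):
    """Return the new mode index after a change encoding rate message.

    `request` is "increase", "decrease", or None (hypothetical encoding
    of the message). The index is clamped to the valid mode range.
    """
    if request == "decrease":
        return max(mode_index - 1, 0)
    if request == "increase":
        return min(mode_index + 1, len(AMR_MODES_KBPS) - 1)
    return mode_index
```

A message that requests a specific encoding rate could instead carry
the target mode index directly.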

In order to generate a change encoding rate message, packet
receiver 24 includes packet loss monitor 32, which determines
whether or not a packet is missing at packet receiver 24, and
determines a packet loss rate that indicates a number of packets
that have been lost in a selected period of time. Change encoding
rate requests are sent from packet receiver 24 in response to the
packet loss rate exceeding an upper threshold or falling below a
lower threshold.

As speech packets travel through IP link 26, network
congestion (a condition similar to rush hour traffic on city streets)
may result in some speech packets being dropped. Congestion
happens when a network router is overrun with incoming traffic.
That is, when data packets arrive before previous packets have been
forwarded, the router will have to provide temporary storage to hold
them for later forwarding. When too much data in too many data
packets arrives before previous data is forwarded, the router may
run out of memory, and the router will discard additional packets
that arrive during the overflow condition. Packet loss monitor 32
examines sequence numbers attached to each packet and determines
whether or not a packet is missing at the receiver.

With reference to FIG. 2, there is depicted a high-level block
diagram of a voice packet transceiver in accordance with the method
and system of the present invention. As illustrated, voice packet
transceiver 50 includes adaptive multi-rate (AMR) speech codec 52,
which receives speech input 54 and produces speech output 56. If
voice packet transceiver 50 is used, for example, in a cellular 
telephone, speech input 54 may come from a microphone, and 
speech output 56 may be sent to a speaker. If voice packet 
transceiver 50 is used in cellular communications system 
infrastructure, for example in a base station, speech input 54 and 
speech output 56 may be coupled to the public switched telephone 
network (PSTN). 

As speech input 54 is encoded, AMR speech codec 52 produces
encoded voice bits 58, which are then used to form an encoded voice
bit payload 60 portion of adaptive rate voice packet 62.

Encoded voice bits 58 are also coupled to CRC generator 64, 
which generates a CRC for adaptive rate voice packet 62. The CRC 
is put into an encoded voice bit header 66 portion of adaptive rate 
voice packet 62. 

In order to control the encoding and decoding rate of AMR 
speech codec 52, encode rate 68 and decode rate 70 are input into 
the codec. The values of encode rate 68 and decode rate 70 are 
determined by rate controller 72. 

To produce speech output 56, AMR speech codec 52 receives
adaptive rate voice packet 74, which includes encoded voice bit
payload 76 and encoded voice bit header 78. As shown, encoded voice
bits 84 from encoded voice bit payload 76 are input into AMR speech
codec 52. Additional information, such as encoding rate
information, comes from encoded voice bit header 78 to control the
decode rate for that frame. Decode rate information 80 is shown
coupled to rate controller 72.

Error detector 82 receives speech bits from encoded voice bit
payload 76, and a speech bit CRC from encoded voice bit header 78.
Error detector 82 then calculates a CRC using encoded voice bits 84,
and compares the calculated CRC with the CRC from encoded voice
bit header 78. Error detector 82 is coupled to AMR speech codec 52
so that the codec may be informed that a speech bit error has been
detected.
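
The comparison performed by an error detector of this kind can be
sketched in Python as follows. This description does not specify a
generator polynomial, so the sketch assumes the common CRC-8
polynomial x^8 + x^2 + x + 1 (0x07) with a zero initial value; the
function names are hypothetical.

```python
def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8 over `data` (assumed polynomial 0x07, init 0)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def speech_bits_ok(voice_bits: bytes, header_crc: int) -> bool:
    """Recompute the CRC at the receiver and compare with the header CRC."""
    return crc8(voice_bits) == header_crc
```

When the check fails, the codec can be told that the frame contains a
speech bit error while the frame itself is still delivered to it.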

According to an important aspect of the present invention,
encoded voice bit header 78 includes sequence number 86 that
indicates the order in which encoded voice bits 76 were encoded.
This order is important because the speech must be decoded in the
same frame order used when it was encoded.

Sequence number 86 is coupled to packet loss monitor 32,
which is part of the function of rate controller 72. As discussed
previously with reference to FIG. 1, packet loss monitor 32
computes a packet loss rate that may represent a number of packets
lost over a selected period of time. Alternatively, the packet loss
rate may be calculated as a percentage of packets lost out of all the
packets sent. It is assumed that the packet loss rate determined by
packet loss monitor 32 is relative to the amount of network
congestion in the network connected to voice packet transceiver 50.

Rate controller 72 generates information that goes into
adaptive rate voice packet 62. Such information includes encode
rate 88, which goes into encoded voice bit header 66 to indicate the
rate at which encoded voice bit payload 60 has been encoded. Rate
controller 72 also generates a change encoding rate message 90,
which is placed in mode request field 92 in encoded voice bit payload
60. Change encoding rate message 90 is used to request that a
remote voice coder encode packets at a different encoding rate.

PCT/US01/42468
WO 02/30098

It is important to note that sequence number 86 in an 
incoming adaptive rate voice packet 74 is used to generate a change 
encoding rate message 90 in order to alleviate congestion on a 
heavily congested network, or, alternatively, to take advantage of 
the bandwidth available on a lightly congested network. If all voice 
packet transceivers 50 connected to a network request a lower voice 
encoding rate when heavy network congestion is detected, the 
network congestion may be alleviated. And, when the network is 
lightly congested, voice packet transceivers 50 may request higher 
encoding rates. 

According to another aspect of the present invention, change
encoding rate message 90 is placed in mode request field 92 in
encoded voice bit payload 60, rather than being placed in encoded
voice bit header 66. This is important when encoded voice bit
header 66 is an error intolerant portion of adaptive rate voice packet
62 and encoded voice bit payload 60 is an error tolerant portion of
the voice packet. Because mode request field 92 is placed in the
payload portion of the packet, the transport layer of the network
does not check change encoding rate message 90 for errors.
Consequently, when using the present invention, errors in change
encoding rate message 90 will not cause the transport layer to
discard the packet.

Just as outgoing adaptive rate voice packet 62 includes mode
request field 92, incoming adaptive rate voice packet 74 may include
mode request field 94 that includes a change encoding rate message
96 that requests a change in encode rate 68. Change encoding rate
message 96 may include parity information, which is checked by
parity checker 98.

With reference now to FIG. 3, there is depicted a high-level
logic flow chart that illustrates the method and operation of
receiving a voice packet in accordance with the method and system
of the present invention. As illustrated, the process begins at block
200, and thereafter passes to block 202 wherein the process receives
packets from a remote voice coder via the network. The remote
voice coder is capable of encoding speech bits at various encoding
rates. The remote voice coder is also able to receive messages
requesting an increase or decrease in voice encoding rates.

Next, the process detects a packet loss rate by examining
sequence numbers of received packets, as depicted at block 204. To
detect a packet loss rate, the process may examine sequence
numbers of received packets and detect that some packets are
missing. A packet loss rate represents a number of packets lost in a
predetermined period of time, or alternatively, a percentage of
packets lost, such as 10 out of 100 sent.
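
One way to derive such a loss rate from sequence numbers is sketched
below in Python. The sketch assumes that packets arrive in order
without duplicates and that the sequence number is a 16-bit counter
that wraps around, as in RTP; the function name and the choice to
report a fraction rather than a count per time period are
illustrative assumptions.

```python
def packet_loss_rate(received_seq, modulus=1 << 16):
    """Fraction of packets missing between the first and last packet seen.

    `received_seq` lists the sequence numbers of the packets that
    arrived, in arrival order (assumed in order, no duplicates).
    """
    if not received_seq:
        return 0.0
    expected = 1
    prev = received_seq[0]
    for seq in received_seq[1:]:
        # The gap is computed modulo the counter size to survive wraparound.
        expected += (seq - prev) % modulus
        prev = seq
    lost = expected - len(received_seq)
    return lost / expected
```

For example, if packets 0 through 9 were sent and packets 3 and 6 never
arrive, the function reports that two of ten packets were lost.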

After detecting a packet loss rate, the process determines
whether or not the packet loss rate exceeds an upper threshold, as
illustrated in block 206. The upper threshold should be set at a rate
that is likely to indicate that the network is heavily congested to a
point that the quality of real-time voice communication is likely to
fall below an acceptable level.

If the packet loss rate exceeds an upper threshold, the process 
sends a change encoding rate message to the remote voice coder to 
request a decrease in the voice coding rate, as depicted in block 208. 
The decrease in voice coding rate should decrease the number of bits 
in each of the future frames sent by the remote voice coder. A 
decreased number of bits in future frames should contribute to
lowering the congestion level of the network. When packets become 
smaller, the total amount of network traffic will be reduced, which, 
in turn, will reduce the amount of memory storage required in the 
congested router, and the router will then become less likely to drop 
future packets. 
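
The size reduction can be estimated with a short calculation, sketched
here in Python. It assumes one 20 ms AMR frame per packet and counts
only the fixed IPv4 (20 bytes), UDP (8 bytes), and RTP (12 bytes)
headers; the AMR frame header, the mode request field, and any
link-layer framing are ignored, and the function names are
hypothetical.

```python
import math

FRAME_MS = 20                 # duration of one AMR speech frame
HEADER_BYTES = 20 + 8 + 12    # IPv4 + UDP + RTP fixed headers


def speech_bits_per_frame(rate_kbps):
    """Speech bits in one frame: kbit/s multiplied by ms gives bits."""
    return round(rate_kbps * FRAME_MS)


def packet_bytes(rate_kbps):
    """Approximate packet size for one frame at the given coding rate."""
    return HEADER_BYTES + math.ceil(speech_bits_per_frame(rate_kbps) / 8)
```

Under these assumptions, dropping from the 12.2 kbit/s mode to the
4.75 kbit/s mode cuts the speech bits per frame from 244 to 95, while
the header bytes stay the same, so the packet shrinks by less than the
ratio of the coding rates suggests.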

After the process sends the decrease coding rate message, the 
process iteratively returns to block 202 to receive additional 
packets. 

With reference again to block 206, if the packet loss rate does
not exceed an upper threshold, the process then determines whether
or not the packet loss rate has fallen below a lower threshold, as
illustrated at block 210. The lower threshold should be selected to
coincide with network congestion falling to a level that would
support the transmission of additional voice packets at an
acceptable packet loss rate, and hence, a higher voice coding rate
would be supported. In some embodiments of the present invention,
the upper and lower thresholds may be set to the same value.
However, in a preferred embodiment, the upper and lower
thresholds are spaced apart to add some hysteresis, or delay, in the
sending of the change encoding rate message.

If the packet loss rate has fallen below the lower threshold,
the process sends a change encoding rate message to the remote
voice coder to request an increase in the voice coding rate, as depicted
at block 212. Such an increase in the voice coding rate will increase
the voice quality at the receiver, and take advantage of the fact that
the network congestion has fallen to a lower level.

If, at block 210, the packet loss rate has not fallen below a
lower threshold, the process iteratively returns to block 202 to receive
the next packet.

As can be seen from the flow chart in FIG. 3, network
congestion is detected by calculating a packet loss rate and
determining whether or not that rate exceeds a threshold.
Messages to change encoding rates at a remote voice coder are sent
from the receiver, depending upon the assumed level of network
congestion and thresholds set in the receiver.
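
The decisions at blocks 206 and 210 of FIG. 3 can be sketched in Python
as follows. The threshold values (5 percent and 1 percent) and the
returned strings are illustrative assumptions; this description does
not give numeric thresholds.

```python
UPPER_THRESHOLD = 0.05  # assumed value: above this, request a lower rate
LOWER_THRESHOLD = 0.01  # assumed value: below this, request a higher rate


def rate_change_request(loss_rate,
                        upper=UPPER_THRESHOLD,
                        lower=LOWER_THRESHOLD):
    """Map a packet loss rate to a change encoding rate request.

    Returns "decrease" (block 208), "increase" (block 212), or None
    when the loss rate lies between the two thresholds.
    """
    if loss_rate > upper:
        return "decrease"
    if loss_rate < lower:
        return "increase"
    return None
```

Keeping the two thresholds apart leaves a band in which no message is
sent, which provides the hysteresis described for the preferred
embodiment.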

With reference now to FIG. 4, there is depicted a high-level
logic flow chart that illustrates the method and operation of
transmitting a voice packet in accordance with the method and
system of the present invention. As shown, the process begins at
block 300, and thereafter passes to block 302 wherein the process
encodes a frame of voice bits. Next, the process generates a CRC for
the encoded voice bits, as illustrated at block 304.

After generating the CRC, the process generates a change
encoding rate message to request a change of the voice encoding
rate in a remote voice coder, as depicted at block 306. The change
encoding rate message requests either an increase in coding rate or
decrease in coding rate, as determined by a packet loss rate at the
remote voice coder, which is described more completely in relation
to FIG. 3. The process may also generate parity information that
will be added to mode request field 92. At the receiving codec, the
parity information can be used to verify the integrity of mode
request field 92. If the parity check fails, the codec will ignore the
change encoding rate message 90, but the codec will still decode the
received speech bits normally.
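
A minimal sketch of such a parity-protected mode request is shown
below in Python. The 4-bit mode value, the single even-parity bit, and
the bit layout are assumptions made for illustration; this description
does not define the format of the parity information.

```python
def parity_bit(value: int) -> int:
    """Even-parity bit: 1 if `value` has an odd number of set bits."""
    return bin(value).count("1") % 2


def encode_mode_request(mode: int) -> int:
    """Pack an assumed 4-bit mode and its parity bit into a 5-bit field."""
    if not 0 <= mode < 16:
        raise ValueError("mode must fit in 4 bits")
    return (mode << 1) | parity_bit(mode)


def decode_mode_request(field: int):
    """Return the requested mode, or None if the parity check fails."""
    mode, parity = field >> 1, field & 1
    return mode if parity_bit(mode) == parity else None
```

A receiver that gets None simply keeps its current encoding rate and
decodes the speech bits in the packet as usual.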

After generating the change encoding rate message, the
process generates an error tolerant portion of an encoded voice
packet, wherein the error tolerant portion includes the change
encoding rate message and encoded voice bits, as illustrated at block
308. An error tolerant portion of a packet is one that will not be
used by the transport layer for determining whether or not to
discard the packet. In other words, the transport layer will not
perform a checksum-type check on the error tolerant portion to
determine whether or not that portion should be passed on through
the network.

Referring to FIG. 5, there is depicted a more detailed
representation of a voice packet in accordance with the method and
system of the present invention. As illustrated, adaptive rate voice
packet 62 includes error tolerant portion 100 that includes change
encoding rate message 102, and encoded speech bits 104.

After generating an error tolerant portion of the voice packet,
the process generates an error intolerant portion of an encoded voice
packet, as depicted at block 310 of FIG. 4. The error intolerant
portion includes a packet sequence number and a CRC for the
encoded voice bits. The error intolerant portion of the encoded voice
packet is a portion that is used by the transport layer for
determining whether or not to discard the packet. In FIG. 5, error
intolerant portion 106 includes real time protocol (RTP) header 108
and AMR frame header 110.

RTP header 108 includes sequence number 86, which tells the
receiving codec the order in which to perform frame decoding, and
payload type 114, which is used to identify the payload in the RTP
as an AMR payload.

AMR frame header 110 includes frame coding rate
information 116, which tells the receiving codec the rate to perform
speech decoding. Also contained in AMR frame header 110 is
speech bit CRC 118, which is used to determine whether or not
speech bits 104 contain an error. For more information about RTP
protocol, see the article entitled "RTP: A Transport Protocol for
Real-Time Applications," RFC 1889, published by Internet
Engineering Task Force, Jan. 1996.

After generating the error intolerant portion of the packet,
the process forms an encoded voice packet by concatenating a voice
packet header, the error intolerant portion, and the error tolerant
portion, as illustrated at block 312. FIG. 5 shows adaptive rate
voice packet 62, which is formed by UDP header 112, error
intolerant portion 106, and error tolerant portion 100.

UDP header 112 includes UDP partial checksum 120, which
is used by the transport layer to check error intolerant portion 106
for errors, and if an error is detected, the transport layer will
discard the packet. Note that only part of the packet is checked for
errors by the transport layer in deciding to discard a packet; error
tolerant portion 100 is not checked for errors by the transport layer.
For more information regarding UDP, see the article entitled "User
Datagram Protocol", which is RFC 768 published by Internet
Engineering Task Force (IETF) August 1980.
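
The effect of a partial checksum can be sketched in Python as follows.
The sketch uses the 16-bit one's complement Internet checksum and
applies it only to the first `coverage` bytes of the packet. It is a
simplification: a real UDP checksum also covers a pseudo-header, and
the function names and the `coverage` parameter are hypothetical.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's complement checksum of `data`."""
    if len(data) % 2:
        data += b"\x00"          # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:           # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def partial_checksum(packet: bytes, coverage: int) -> int:
    """Checksum over only the leading, error intolerant bytes."""
    return internet_checksum(packet[:coverage])
```

If `coverage` is set to the length of the error intolerant portion, a
bit error in the mode request field or the speech bits leaves the
checksum unchanged, so the packet still reaches the codec, whereas a
bit error in the covered headers changes it.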

In current wireless cellular communications system design,
referred to as a 3GPP system, an 8-bit CRC field is used for
detecting errors in a more sensitive portion of the speech bits in a
voice packet, a portion referred to as Class A bits in an AMR frame.
But this CRC field is only added by the radio transmitter before the
AMR frame is sent over the air link, and it is removed by the radio
receiver right after the frame is received from the air link. In other
words, the CRC mechanism, as defined in 3GPP, is only applied to
the over-the-air link, and is not available for use in any of the other
links of the voice over IP (VoIP) connection.

According to another aspect of the present invention, an 8-bit
CRC field is added to the AMR frame format as a permanent field,
that is, a field that will remain in the frame all the way to the AMR
decoder, where it will be examined by the decoder. The CRC will be
generated by the AMR encoder, rather than by the
radio transmitter in the middle of the connection, and the CRC will
then be used by the receiving AMR decoder, rather than being
removed by the radio receiver in the middle of the connection. The
advantages of this method and system include: 1) simplifying the
wireless transport layer at the transmitter and receiver in the radio
link because the wireless transport layer will no longer need to
understand the format of the payload frame (in the prior art 3GPP
approach the transport layer needs to know where to find the Class
A bits, etc.); and 2) error checking the sensitive Class A bits all the
way through the entire connection.



The foregoing description of a preferred embodiment of the 
invention has been presented for the purpose of illustration and 
description. It is not intended to be exhaustive or to limit the 
invention to the precise form disclosed. Obvious modifications or 
variations are possible in light of the above teachings. The 
embodiment was chosen and described to provide the best 
illustration of the principles of the invention and its practical 
application, and to enable one of ordinary skill in the art to utilize
the invention in various embodiments and with various
modifications as are suited to the particular use contemplated. All
such modifications and variations are within the scope of the 
invention as determined by the appended claims when interpreted 
in accordance with the breadth to which they are fairly, legally, and 
equitably entitled. 

Claims

What is claimed is:

1. A method in a network for improving transmitted voice
quality, the method comprising the steps of:

receiving packets from a voice coder, wherein the packets
have been encoded at a higher encoding rate;

detecting a packet loss rate that exceeds a threshold loss
rate;

sending a reduce encoding rate message to the voice coder to
cause the voice coder to encode the packets at a lower
encoding rate.

2. The method for improving transmitted voice quality
according to claim 1 wherein the step of sending a reduce
encoding rate message to the voice coder further includes
sending a reduce encoding rate message to the voice coder in an
error tolerant portion of a packet, wherein the error tolerant
portion is not used by the transport layer for determining
whether or not to discard the packet.

3. The method for improving transmitted voice quality
according to claim 2 wherein the step of sending a reduce 
encoding rate message to the voice coder in an error tolerant 
portion of a packet further includes sending a reduce encoding 
rate message and message parity information to the voice coder 
in an error tolerant portion of a packet, wherein the message 
parity information is generated from the reduce encoding rate 
message. 



4. The method for improving transmitted voice quality
according to claim 1 further includes verifying the integrity of
selected voice bits in a received packet by calculating a cyclic
redundancy check value for the selected voice bits and
comparing the cyclic redundancy check value to an included
cyclic redundancy check value that was put in the received
packet at the voice coder.

5. The method for improving transmitted voice quality 
according to claim 4 wherein the selected voice bits are class A 
voice bits in an adaptive multi-rate encoded frame. 



6. The method for improving transmitted voice quality 
according to claim 1 wherein the step of receiving packets from 
a voice coder further includes receiving packets of adaptive 
multi-rate encoded voice bits from a voice coder.

7. A system in a network for improving transmitted
voice quality comprising: 

means for receiving packets from a voice coder, wherein the 
packets have been encoded at a higher encoding rate; 

means for detecting a packet loss rate that exceeds a 
threshold loss rate; 

means for sending a reduce encoding rate message to the 
voice coder to cause the voice coder to encode the packets 
at a lower encoding rate. 



8. The system for improving transmitted voice quality
according to claim 7 wherein the means for sending a reduce
encoding rate message to the voice coder further includes
means for sending a reduce encoding rate message to the voice
coder in an error tolerant portion of a packet, wherein the error
tolerant portion is not used by the transport layer for
determining whether or not to discard the packet.

9. The system for improving transmitted voice quality
according to claim 8 wherein the means for sending a reduce 
encoding rate message to the voice coder in an error tolerant 
portion of a packet further includes means for sending a reduce 
encoding rate message and message parity information to the 
voice coder in an error tolerant portion of a packet, wherein the 
message parity information is generated from the reduce 
encoding rate message. 

10. The system for improving transmitted voice quality
according to claim 7 further includes means for verifying the 
integrity of selected voice bits in a received packet by calculating 
a cyclic redundancy check value for the selected voice bits and 
comparing the cyclic redundancy check value to an included 
cyclic redundancy check value that was put in the received 
packet at the voice coder. 